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L  INTRODUCTION 

One  technique  for  producing  nonparametric  estimates  of  a  sample’s  underlying  pro¬ 
bability  density  function  and  its  associated  cumulative  distribution  function  is  kernel 
estimators.  A  kernel  density  estimate,  f(z),  is  defined  for  a  sample  with  values 
Xt,X2,  .  .  .  ,  Xq  at  each  fixed  point  z  as  follows: 


where 


^z)  >0  for  all  z 


/  f(z)d2=l 


where  the  kernel  function  K  is  a  probability  density  function  symmetric  about  zero  and 
the  data-bassd  smoothing  parameter  h  is  strictly  positive.'  A  kernel  estimate  of  a 
sample’s  underlying  cumulative  distribution  function,  F(z),  can  be  computed  by 
integrating  f(z)  with  the  kernel  function’s  proper  limits  of  integration. 

Kernel  estimators,  however,  are  plagued  with  problems  ranging  from  choosing  an 
appropriate  kernel  estimator,  assessing  the  robustness  of  a  kernel  estimator  and  deter¬ 
mining  the  optimum  value  for  the  kernel  estimator’s  smoothing  parameter  given  a  par¬ 
ticular  sample.  Despite  these  limitations  kernel  density  estimators  should  not  be  over¬ 
looked  among  other  nonparametric  estimators  such  as  histograms. 

The  purpose  of  this  paper  is  to  present  an  introduction  to  kernel  estimators  by  exa¬ 
mining  the  Gaussian  and  Lognormal  kernels.  These  kernel  estimators  will  be  applied  to 
two  .Artillery  Target  Intelligence  message  (ATIs)  samples  (Tables  I  and  2)  from  the 
Fire  Support  Team  (FIST)  Force  Development  Testing  and  Experimentation  D  (FDT&E 
n)  conducted  by  the  U.S.  Army  Field  Artillery  Board  at  Fort  Riley,  KS  during  .April 
and  May  1984.'  .Although  a  kernel  density  estimate  is  defined  as  a  normalized  density 
estimate,  the  figures  in  this  report  will  depict  unnormalized  density  estimates  based  on  a 
size  spacing,  z.  equal  to  one  tenth  of  a  second.  Work  with  these  samples  suggested 
modifying  both  the  Gaussian  and  Lognormal  kernels.  Subsequently,  these  modified  ker¬ 
nels  will  also  be  presented.  In  addition,  the  previously  mentioned  problems  associated 
with  kernel  estimation  will  be  examined  to  clarify  the  magnitude  of  their  infiuence  and 
to  show  the  usefulness  of  kernel  density  estimation. 


'Scott.  DftTid  >V  Aad  F&ctor.  Ljuette  E..  “Monte  Cnrlo  Stndj  of  Three  Dnt^Bnsed  NonpAnmetnc  Probability  D<*D5ttv 
Emmntora. '  /ovmW  of  iht  .Afftencea  SUUHtooi  At90€$4i*on,  roi  76.  no  373.  Mnrcb  1981,  p  10 

“K»ste.  VirfiBia  .  BrodeeB,  Doaglu  C  tad  Wiaaer.  Weady  “DtscnptiOB  of  the  Dijitii  Oua  Collected  from  the  FIST 
FDT£E  11.  "  Ballistic  Research  Laboratory  Techaical  Report  No  2679.  IS85 
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n.  GAUSSIAN  KERNEL 


Definition 

A  Gaussian  kernel  estimate  of  a  sample’s  underlying  probability  density  function 
uses  the  normal  probability  density  function  for  the  kernel  function  and  is  defined  as  fol¬ 
lows:’ 

For  z:  z  €  R,  -oo  <  z  <  oo 


1  “  I 

f(z)=-Lv-^< 

nh 


Since  the  normal  probability  density  function  is  a  unimodal  function  symmetric  about 
zero  and  defined  for  values  ranging  from  -oo  to  oc,  this  kernel  density  estimator  can  be 
applied  to  real-valued  samples  whose  sample  values  range  from  -oo  to  oo.  In  addition, 
this  kernel  will  produce  a  density  estimate  that  is  symmetric  about  each  sample  value. 
Appendix  A  contains  a  user’s  guide  for  the  FORTR*AN  programs  provided  in  .\ppen- 
dices  B  through  G.  Appendix  B  contains  the  FORTRAN  code  programming  the  Gaus¬ 
sian  kernel’s  estimate  of  a  sample's  underlying  probability  density  function  and  cumula¬ 
tive  distribution  function. 


Application  of  the  Gaussian  kernel  to  the  FIST  HQ  samples 

Figures  1  through  3  present  Gaussian  kernel  density  estimates  for  the  ATI  .\uto 
sample  superimposed  on  the  sample's  histogram  showing  the  number  of  service  time 
observations.  Each  of  these  estimates  was  produced  by  using  different  values  for  h.  the 
data-based  smoothing  parameter.  Figure  1  with  h  equal  to  2.00  seems  to  oversmooth 
this  FIST  HQ  service  time  sample.  In  this  report,  mode  refers  to  the  number  of  ‘local 
maxima"  in  a  density  estimate  and  differs  from  the  statistical  definition  of  mode(e.g., 
the  observation(s)  which  occur(s)  with  the  most  frequency).  Figure  2  with  h  equal  to 
1.00  produces  a  density  estimate  with  five  significant  modes  and  seems  to  give  an 
acceptable  density  estimate  for  this  data  sample.  In  Figure  3  with  h  equal  lo  0.50.  the 
density  estimate  has  seven  modes  and  seems  overly  sensitive. 

Figures  4  through  8  present  Gaussian  kernel  density  estimates  for  the  .\TI  Review 
sample  in  the  interval  [0,50]  for  h  equal  to  5.00,  3.00  and  1.00.  respectively.  Both  den¬ 
sity  estimates  in  Figures  4  and  5  seem  to  oversmooth  this  sample.  However,  the  kernel 
density  estimate  in  Figure  8  suggests  five  significant  modes  for  the  .\TI  Re^iew  sample 
in  the  interval  [O.oOj  and  seems  to  supply  an  acceptable  density  estimate  for  the  .\TI 

3 

Ttpia.  Richuil  A  *ad  Thompsoa.  James  R  .  .Vanfrwmtlru  F'lfisiMf  Denntf  Eiltmaticn.  Baltimore.  SID  The  Joha  Horrkiss 
l.  aiTersrtT  Press,  1978,  p  61 
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IM{>iiru  I.  Application  of  the  Gaussian  Kernel  with  h=2.(j0  to  ATI  Auto  Sample. 


GaussiAn  K*rn*l 


isian  Kernel  with  h=l.UO  to  ATI  Auto  Sample. 


Gaussian  Kernel  with  h=0.50  to  ATI  Auto  Sample 


Review  sample. 


From  Fignres  1  through  6,  one  can  see  that  the  value  chosen  for  h  varies  the  ker¬ 
nel  density  estimate.  To  understand  the  relationship  between  the  h  value  and  the 
number  of  modes  in  the  mtimate,  Table  3  presents  the  critical  h  values  for  the  FIST 
HQ  samples.  A  critical  h  value  is  the  smallest  h  value  producing  a  kernel  density  esti¬ 
mate  with  m  modes,  and  therefore,  h  values  smaller  than  the  critical  h  value  produce 
kernel  density  estimates  with  more  than  m  modes.  The  critical  h  values  in  Table  3 
were  obtained  by  interactively  selecting  h  values  that,  in  turn,  were  used  to  produce 
kernel  density  estimates. 

TABLE  3.  Gaussian  Kernel’s  Critical  h  Values 
for  the  FIST  HQ  Samples 


Sample 

Number  of  Modes 
^^nte^ayOjoO^^ 

Critical  h  Value 
to  two  decimal  olac^vs 

.ATI  .Auto 

1 

2 

3 

4 

5 

6 
m 

4 

10.01 

2.73 

1.85 

1.01 

1.00  <  h  <  1.01 
0.65 

0.50 

ATI  Review 

1 

7.05 

2 

4.08 

3 

2.60 

4 

1.94 

3 

l.OO 

6 

0.99  <  h  <  1.00 

7 

0.42 

From  Table  3  one  can  see  that  as  the  h  value  increases  the  smoothness  of  the  ker¬ 
nel  density  estimate  increases.  The  role  that  h  performs  for  kernel  estimation  is  analo¬ 
gous  to  the  role  that  the  (data-based)  interval  size  performs  for  a  histogram.  For  a  his¬ 
togram  the  general  rule  of  thumb  for  an  appropriate  interval  size  is  10  to  •_*•>  intervals 
covering  the  range  of  the  sample.^ 

Tapia  and  Thompson  have  developed  a  recursive  formula  for  determining  the  glo¬ 
bal  optimal  h  for  a  kernel  estimator  by  minimizing  the  integrated  mean  square 
errorfIMSE).  The  IMSE  and  the  recursive  formula  is  deflned  as  follows:’’ 


Gaunum,  I.  Md  Wilkt.  S  3,  Intreiueterf  Enftnttrinf  New  York:  Jobs  '.Viley  ft  Sons.  ISSS  need  bv  '.V 

lad  Factor.  Lyaette  E..  “Moate  Cario  Stady  of  Three  OatvBaeed  Noaparametne  Probability  Density  Estimators. '  .'jurnai  of  >ht 
.tmcncea  SItititui/  /twacMiMw,  eoi.  T6.  ao.  173,  March  Ittl.  p.  9. 

Tapia.  Richard  A.  aad  Thomptoa.  Jamei  R.,  .Vonparaiiictnc  PretetiMf  Dtntriif  CrtimaiMn,  Saltimore.  MD:  The  iobn  Hopkins 
Uairersity  Preei.  1978.  pp.  <8.59. 


IMSE(fl 


■■  ■  t  ■■  i;."  im  1  ■■  V 1  ,■  n* n:"  i^"  ■.■* ig 


IMSE{fl  =  /E[(f(z)  -f(z))=]dz 


•  1  ** 

[/lK(y)|=dy]  +  k'^l  f-^^dy  [/  ( I<'I(J) )  "dz] 


and 

-1 

2r+l 


c^iKW) 


where 


-i 

^f)  =  (/(f<^)(z))'dz) 


Thus,  each  successive  value  for  h,  h‘■*■^  is  computed  by  using  the  previous  h  value,  h\ 
contained  in  f^'*(z).  In  application,  f(z)  estimates  f(z).  The  characteristic  exponent,  r, 
must  be  determined  for  each  kernel  function  K.  and  Tapia  and  Thompson  report  that 
the  Gaussian  kernel’s  characteristic  exponent  is  2.* 

The  attractiveness  of  Tapia  and  Thompson’s  formula  is  that  it  is  tolerant  of 
extreme  guesses  for  the  initial  h  value.  More  importantly,  however.  Scott  and  Factor 
have  noted  that  choosing  the  initial  h  value  to  be  the  sample’s  range  guarantees  the 
convergence  of  this  recursive  formula  to  the  largest  optimal  solution.^  Unfortunately, 
this  optimizing  formula  occasionally  converges  to  0.  and  the  resulting  Dirac  spike  esti¬ 
mate  (optimum  h=0)  is  a  degenerate  estimate.*  However,  since  kernel  estimators  are 
not  generally  robust  against  unsatisfactory  values  for  the  data-based  smoothing  parame¬ 
ter,®  Tapia  and  Thompson's  recursive  formula  is  advantageous  since  it  minimizes  this 
problem.  .Appendix  C  contains  the  FORTRAN  code  programming  this  recursive  for¬ 
mula  for  the  Gaussian  kernel. 

Using  Tapia  and  Thompson’s  formula  on  the  ATI  .Auto  sample  produces  a  Dirac 
spike  estimate  (i.e.,  optimum  h=0).  But.  using  Tapia  and  Thompson's  formula  on  the 
•ATI  Review  sample  suggests  that  the  optimum  h  value  is  approximately  1.02,  and  this 


A 

Tapia.  Richard  and  Thompioa,  James  R  .  .VenperemeirK  FrottiiiUf  DtniUf  SiUmttitn.  Baltimore.  MD  The  John  Hopkins 
L'nirersiejr  Press.  1978.  p  SO. 
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optimum  Gaussian  kernel  density  estimate  in  Figure  7  “fits"  the  ATI  Review  sample’s 
histogram  well. 

Although  Figure  7  depicts  this  sample  with  five  significant  modes  m  the  interval 
[0,50]  (  and  thirteen  significant  modes  for  the  entire  range  of  the  sample),  kernel  estimar 
tors  cannot  take  into  account  all  the  theoretical  and  experimental  conditions  producing 
a  data  sample.  As  Parzen  noted,  after  a  reliable  kernel  density  estimate  for  a  sample  is 
obtained,  one  should  investigate  the  theoretical  and  experimental  causes  of  the  sug¬ 
gested  modes.  Parzen  suggests  that  this  inquiry  may  justify  adjusting  the  value  of  the 
smoothing  parameter  in  some  instances.^  For  example,  since  FIST  FDT&E  11  service 
times  were  only  resolved  to  the  nearest  second,  further  analysis  may  suggest  adjusting 
the  value  of  the  smoothing  parameter.  For  the  purposes  of  this  report,  analysis  to 
determine  the  optimum  h  values  based  on  theoretical  and  experimental  conditions  will 
not  be  undertaken  and  should  not  detract  from  the  a  priori  power  of  kernel  density  esti¬ 
mation. 

Monte  Carlo  aimnlations  for  the  Gansaism  kernel 

To  evaluate  the  performance  of  the  Gaussian  kernel  and  Tapia  &  Thompson’s 
recursive  formula,  .Vfonte  Carlo  simulations  were  conducted  from  the  following  distribu¬ 
tions:  a)  N(0,1),  the  standard  normal  distribution;  b)  0.5  N(-1.5,l)  +  0.5  N(1.5,i),  a  SO¬ 
SO  mixture  of  two  normal  distributions;  c)  an  F  distribution  with  (10.10)  degrees  of  free¬ 
dom;  d)  a  Gamma  distribution  with  f(x)  ss  x  *  exp(-x)  for  x  between  0  and  oc;  and  e)  an 
Exponential  distribution  with  parameter  (9  =  1.  In  order  to  evaluate  the  performance  of 
the  Gaussian  kernel  and  Tapia  &  Thompson’s  recursive  formula,  average  efficiency  was 
selected  and  was  computed  by  dividing  the  theoretical  IMSE  by  the  mean  IMSE. 
Results  for  Monte  Carlo  simulations  for  distributions  a  through  e  were  reported  by  both 
Tapia  &  Thompson,  and  Scott  &  Factor.  Analogous  to  their  simulations.*  twenty-five 
pseudo-random  samples  with  a  sample  size  of  a  were  generated  for  each  distribution. 
Table  4  presents  the  results  from  these  Monte  Carlo  simulations. 

The  results  for  the  first  three  distributions  in  Table  4  are  close  to  the  results 
reported  by  Scott  &  Factor,  and  Tapia  &  Thompson.*  With  the  exception  of  the 
Exponential  distribution,  the  mean  optimum  h  values  for  each  distribution  in  Table  4 
conform  relatively  well  with  the  theoretical  optimum  h  values.  As  the  sample  size,  n. 
increased,  the  range  of  the  optimum  fa  values  decreased.  As  anticipated,  the  average 
efficiency  increased  as  n  increased  for  the  first  five  distributions  and  was  best  for  Monte 
Carlo  simulations  from  the  N(0,1)  distribution.  The  latter  result  was  expected  since  the 
standard  normal  probability  density  function  is  used  for  the  Gaussian  kernel  function. 
Despite  the  reduced  average  efficiency  of  the  mixed  normal  distribution  in  comparison  to 
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TABLE  4.  Monte  Carlo  Simulations  Employing  the  Gaussian  Kernel 
(Each  row  represents  25  samples  of  size  n.) 
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the  standard  normal  distribution,  the  Gaussian  kernel  and  Tapia  &  Thompson's  formula 
are  reliable  for  this  mixed  normal  distribution. 

The  F.  Gamma  and  Exponential  distributions  in  Table  4  were  chosen  to  show  the 
performance  of  the  Gaussian  kernel  and  Tapia  &  Thompson's  formula  on  strictly  non¬ 
negative  samples.  The  results  for  the  F  and  Gamma  distributions  indicate  reliable  per¬ 
formance;  however,  the  Gaussian  kernel  and  Tapia  &  Thompson's  formula  show  reduced 
performance  for  the  Exponential  dbtribution  as  n  increases.  The  density  function  for  an 
E.xponential  distribution  with  parameter  S  =  I  is  concentrated  close  to  zero.  and.  there¬ 
fore.  it  is  not  surprising  that  the  symmetric  Gaussian  kernel  has  reduced  average 
efficiency. 


The  Monte  C&rlo  results  in  Table  4  support  Scott  and  Factor’s  statement  that 
extensive  numerical  work  has  shown  that  for  sample  sizes  as  small  as  twenty-five,  the 
mean  optimum  h  values  based  on  Tapia  &  Thompson's  formula  are  on  average  close  to 
the  theoretical  optimum  h  values’ 

Criticisms  of  the  Gaussian  kernel 

The  Gaussian  kernel  is  one  of  the  popular  unimodal  kernel  functions  that  is  sym¬ 
metric  about  zero,  and  it  is  now  known  that  many  symmetric  unimodal  kernel  functions 
are  nearly  optimal.^  Although  the  Gaussian  kernel  is  efficient  and  reliable.  Figures  1 
through  7  show  that  the  Gaussian  kernel  density  estimator  predicts  non-zero  densities 
for  service  times  less  than  zero  seconds.  Since  negative  service  times  are  theoretically 
and  experimentally  impossible,  one  can  conclude  that  the  Gaussian  kernel  may  not  be 
the  best  kernel  to  use  for  a  strictly  non-negative  sample  with  values  “close”  to  or  equal 
to  zero.  Therefore,  the  Gaussian  kernel  will  be  truncated  below  a  value  considered  to  be 
“too  close”  to  zero  in  order  to  “create”  a  non-symmetric  unimodal  kernel  function. 

m.  TRUNCATED  GAUSSIAN  KERNEL 


Definition 

For  strictly  non-negative  samples,  a  truncated  normal  curve  for  sample  values  lying 
below  .3»h  seemed  to  be  a  plausible  modification  to  the  accepted  Gaussian  kernel.  Tliis 
truncated  Gaussian  kernel  density  estimator  is  defined  as  follows; 

For  z:  z  €  R,  0  <  z  <  00 


where 
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Appendix  D  contains  the  FORTRAN  code  programming  the  truncated  Gaussian  kernel’s 
estimate  of  a  sample’s  underlying  probability  density  function  and  cumulative  distribu- 
tion  function. 


Application  of  the  truncated  Ganaaian  kernel  to  the  FIST  HQ  samples 

Table  5  presents  the  truncated  Gaussian  kernel’s  critical  h  values  for  the  FIST  HQ 
samples.  These  values  were  obtained  by  interactively  selecting  h  values  that,  in  turn, 
were  used  to  produce  kernel  density  estimates. 

TABLE  5.  Truncated  Gaussian  Kernel’s  Critical  h  Values 
for  the  FIST  HQ  Samples 


Sample 


Number  of  Modes 
in  interval  fO.-SOl 


1 


Critical  h  Value 
to  two  decimal  places 


9.81 

2.64 

1.8.5 

1.01 

1.00  <  h  <  1.01 
0.65 
0.50 


Figures  8  through  10  present  truncated  Gaussian  kernel  density  estimates  for  the 
.\TI  .\uto  sample  using  h  equal  to  2.00,  1.00  and  0.50,  respectively.  The  density  esti¬ 
mate  in  Figure  0  with  a  smoothing  parameter  equal  to  1.00  seems  to  be  a  good 


TruncAt«d  Gaussian  Karnal 


A|)|)l  ii  ation  of  tho  Truncated  f.aussian  Kernel  wit»>  h=0.50  to  ATI  Auto  Sample. 


estimate  of  the  underlying  probability  density  function  and  predicts  the  same  modes  as 
the  Gaussian  kernel  density  estimate  in  Figure  2.  The  only  main  difference  between 
these  two  density  estimates  occurs  in  the  region  below  zero  seconds. 


Figures  11  through  13  present  truncated  Gaussian  kernel  density  estimates  for 
the  ATI  Review  sample  with  h  equal  to  5.00,  3.00  and  1.00,  respectively.  The  density 
estimate  in  Figure  13  with  the  smoothing  parameter  equal  to  1.00  seems  to  be  a  good 
density  estimate. 

Appendix  E  contains  the  FORTRAN  code  programming  Tapia  and  Thompson’s 
optimizing  h  formula  for  the  truncated  Gaussian  kernel.  Using  this  programming  code 
on  the  ATI  Auto  sample  suggested  that  the  optimum  h  value  is  approximately  0.62. 
Figure  14  compares  the  truncated  Gaussian  kernel  estimate  for  the  ATI  Auto  sample 
using  this  h  value  and  the  histogram  for  this  sample.  Using  Tapia  and  Thompson’s 
optimizing  h  formula  on  the  ATI  Review  sample  produced  a  Dirac  spike  estimate  when 
the  sample  range  was  used  for  the  initial  h  value;  however,  using  10.00,  5.00  and  1.00  as 
the  initial  h  values  each  produced  optimizing  h  values  approximately  equal  to  1.27. 
Although  Tapia  and  Thompson's  formula  converges  to  0  with  an  initial  h  value  equal  to 
the  sample’s  range,  other  initial  h  values  suggest  that  a  good  h  value  produces  a  density 
estimate  with  5  modes  in  the  interval  [0.30],  and  substantiates  the  earlier  claim  that  the 
density  estimate  in  Figure  13  with  a  smoothing  parameter  equal  to  1.00  is  a  reasonable 
density  estimate  for  the  ATI  Review  sample. 


Monte  Carlo  simulations  for  the  truncated  Gaussian  kernel 

To  evaluate  the  performance  of  the  truncated  Gaussian  kernel  and  Tapia  & 
Thompson’s  recursive  formula,  Monte  Carlo  simulations  were  conducted  from  the  same 
distributions  as  the  Monte  Carlo  simulations  for  the  Gaussian  kernel.  Analogous  to  the 
Gaussian  kernel  simulations(Table  4),  twenty>&ve  pseudo-random  samples,  each  with  a 
sample  size  of  n,  were  generated  from  these  distributions  and  the  h  values  associated 
with  these  samples  were  calculated.  Table  8  presents  the  results  from  these  Monte 
Carlo  simulations. 


Table  8  reflects  the  difficulties  that  were  associated  with  measuring  the  average 
efficiency  of  these  Monte  Carlo  simulations.  For  instance,  although  the  value  for  a  (K) 
of  Tapia  &  Thompson’s  formula  can  be  evaluated  for  the  truncated  Gaussian  kernel,  its 


two  different  values,  — ^  for  Xi<3*h  and  z  >  0,  and  — ^  for  Xi>3*h  and  z  >  0,  inter- 
v3r  2v7r 

fere  with  the  evaluation  of  the  the  theoretical  optimum  h  values,  IMSE  [f]  values  and 
subsequently,  the  average  efficiency  values.  Thus,  Table  8  presents  the  theoretical 
ranges  for  the  optimum  h  value  and  IMSE(f].  The  lower  and  upper  limits  of  these 
ranges  were  computed  by  using  the  two  different  o(K)  values.  The  h  value  associated 
with  each  sample  of  size  n,  however,  was  computed  by  using  the  a(K)  value  that  was 
correct  for  each  sample  value  x-,. 
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TABLE  6.  Monte  Carlo  Simulations  Employing  the  Truncated  Gaussian  Kernel 
(Each  row  represents  25  samples  of  size  n.) 
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Although  the  efficiency  range  in  Table  9  cannot  be  directly  compared  to  the  aver* 
age  efficiency  in  Table  4,  it  is  helpful  in  drawing  some  tentative  conclusions.  First, 
preliminary  comparisons  between  these  tables  suggest  that  as  expected,  the  truncated 
Gaussian  kernel  may  not  be  quite  as  efficient  as  the  Gaussian  kernel  for  normal  or 
mixed  normal  distributions.  Second,  the  source  of  the  alarmingly  low  efficiency  range  of 
the  F  distribution  with  (10.10)  degrees  of  freedom  in  Table  0  has  not  been  identified 
and  seems  counter  intuitive.  Third,  applying  the  truncated  Gaussian  kernel  to  Monte 
Carlo  simulations  from  the  Gamma  and  Exponential  distributions  would  appear  to 
result  in  efficiency  ranges  which  equal  or  surpass  the  average  efficiency  of  the  Gaussian 
kernel’s  simulations.  Additional  Monte  Carlo  simulations  would  be  helpful  in  determin¬ 
ing  the  performance  of  Tapia  &  Thompson’s  for  the  truncated  Gaussian  kernel  in  com¬ 
parison  to  the  Gaussian  kernel.  More  beneficial,  however,  would  be  a  more  sophisti¬ 
cated  procedure  for  computing  the  average  efficiency. 


Criticisms  of  the  truncated  Gaussian  kernel 


From  the  truncated  Gaussian  kernel  figures  for  the  FIST  HQ  samples,  it  is 
apparent  that  this  kernel  density  estimator  provides  a  probability  density  function  esti¬ 
mate  with  zero  density  for  all  negative  values.  This  property  conforms  perfectly  with 
the  theoretical  and  experimental  service  time  expectations  underlying  the  FIST  HQ  sam¬ 
ples.  Preliminary  Monte  Carlo  simulations  with  skewed  non-negative  distributions  sug¬ 
gested  that  the  use  of  Tapia  &  Thompson’s  formula  with  the  truncated  Gaussian  kernel 
was  more  efficient  than  the  Gaussian  kernel.  Thus,  this  supports  the  use  of  the  trun¬ 
cated  Gaussian  kernel  for  the  FIST  HQ  samples.  To  further  examine  these  FIST  HQ 
samples  with  another  non-symmetric,  unimodal  kernel  function,  it  was  decided  that  a 
Lognormal  kernel,  which  can  only  be  applied  to  strictly  positive  samples,  should  also  be 
developed. 

IV.  MODE  CENTERING  LOGNORMAL  KERNEL 


Definition 

.\  Lognormal  kernel  density  estimator  concentrating  its  kernel  estimate  around  the 
sample's  mode  is  defined  as  follows: 


For  z:  z  <  R,  0  <  z  <  00 


h 


Unlike  the  normal  distribution  which  is  symmetric,  the  Lognormal  distribution  is 
skewed  and  can  only  be  evaluated  for  strictly  positive  values.  Appendix  F  contains  the 
FORTRAN  code  programming  the  mode  centering  Lognormal  kernel’s  estimate  of  a 
sample’s  underlying  probability  density  function  and  cumulative  distribution  function. 


Application  of  the  mode  centering  Lognormal  kernel  to  the  FIST  HQ  samples 

Due  to  the  definition  of  the  Lognormal  distribution,  it  was  necessary  to  modify  the 
zero  second  FIST  HQ  service  times.  Although  FIST  HQ  service  times  cannot  be 
theoretically  less  than  or  equal  to  zero  seconds,  FIST  HQ  service  times  were  only 
resolved  to  the  nearest  second.  Thus,  a  zero  second  service  time  is  actually  a  service 
time  of  less  than  one  second.  Therefore,  the  interpretation  of  zero  second  service  times 
as  half  second  service  times  seems  plausible. 

Table  7  presents  the  mode  centering  Lognormal  kernel’s  critical  h  values  for  the 
FIST  HQ  samples.  Comparing  these  values  with  the  critical  h  values  for  the  Gaussian 
and  truncated  Gaussian  kernels  in  Tables  3  and  S,  one  can  see  the  higher  sensitivity  of 
the  smoothing  parameter  value  for  the  mode  centering  Lognormal  kernel.  These  values 


were  obtained  by  interactively  selecting  h  values  that,  in  turn,  were  used  to  produce 
kernel  density  estimates. 


TABLE  7.  Mode  Centering  Lognormal  Kernel’s  Critical  h  Values 
for  the  FIST  HQ  Samples 


Sample 

Number  of  Modes 
in  interval  [0.50| 

Critical  h  Value 
to  two  decimal  places 

-ATI  Auto 

1 

0.38 

2 

0.35 

3 

0.30 

4 

0.24 

5 

0.20 

8 

0.18 

7 

0.14 

.ATI  Review 

1 

0.42 

2 

0.38 

3 

0.34 

4 

0.30 

5 

0.25 

6 

0.10 

1 

0.07 

Figures  IS  through  17  present  kernel  density  estimates  for  the  ATI  .Auto  sample 
with  h  equal  to  0.50,  0.25  and  0.22,  respectively.  These  figures  show  that  the  applica¬ 
tion  of  the  mode  centering  Lognormal  kernel  to  this  sample  results  in  a  very  peaked 
density  estimate  around  a  half  of  a  second  and  attributes  an  unnoticeable  density  esti¬ 
mate  to  the  tail  service  time  observation  at  forty-four  seconds. 

Figures  18  through  20  present  kernel  density  estimates  for  the  .ATI  Review  sam¬ 
ple  with  h  equal  to  0.50,  0.25  and  0.20,  respectively.  Comparing  the  mode  centering 
Lognormal  kernel  estimate  in  Figure  19  with  the  truncated  Gaussian  kernel  estimate  in 
Figure  13,  one  can  see  an  apparent  difference  between  these  two  density  estimates. 
For  instance,  the  truncated  Gaussian  kernel  attributes  a  single  mode  to  the  service  time 
observations  between  0  and  5  seconds;  whereas,  the  mode  centering  Lognormal  .kernel 
attributes  three  modes  to  these  service  time  observations.  .Also,  the  truncated  Gaussian 
kernel  attributes  three  significant  modes  to  the  service  time  observations  beyond  15 
seconds;  whereas,  the  mode  centering  Lognormal  kernel  “smears"  these  observations 
together. 

Criticisma  of  the  mode  centering  Lognormnl  kernel 

.Applying  the  mode  centering  Lognormal  kernel  to  the  FIST  HQ  samples  resulted  in 
density  estimates  that  minimized  the  effect  of  tail  observations.  In  addition,  the  modal 
nature  of  the  density  estimates  tend  to  emphasize  those  observations  closest  to  zero. 
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.ognorroul  Kernel  with  h=0.22  to  ATI  Auto  Sample. 


Centering  Lognormal  Kerne]  with  h=0.25  to  ATI  Review  Sample. 


Thus,  the  estimated  underlying  probability  density  functions  associated  with  the  mode 
centering  Lognormal  kernel  differ  markedly  from  those  obtained  from  the  Gaussian  and 
truncated  Gaussian  kernels.  The  high  sensitivity  of  the  smoothing  parameter  for  the 
FIST  HQ  samples  to  small  changes  could  hinder  the  selection  of  an  optimum  smoothing 
parameter  value,  and  it  was  decided  that  it  would  not  be  worthwhile  to  apply  Tapia  & 
Thompson’s  formula  and  to  conduct  Monte  Carlo  simulations.  Instead,  the  median 
centering  Lognormal  kernel  was  developed. 

V.  MEDIAN  CENTERING  LOGNORMAL  KERNEL 


Definition 

A  Lognormal  kernel  density  estimator  concentrating  its  kernel  estimate  around  the 
sample’s  median  is  defined  as  follows: 

For  2:z«R,  0<2<oo 


0  1  I 

f(2)=S^« 


where 


y~log,  i(l+v'l+(2h/Xj)-) 


.■\ppendix  G  contains  the  FORTRAN  code  programming  the  median  centering  Lognor¬ 
mal  kernel’s  estimate  of  a  sample’s  underlying  probability  density  function  and  cumula¬ 
tive  distribution  function. 

Application  of  the  median  centering  Lognormal  kernel  to  the  FIST  HQ  samr 
pies 

Table  8  presents  the  median  centering  Lognormal  kernel’s  critical  h  values  for  the 
FIST  HQ  samples.  These  values  were  obtained  by  interactively  selecting  h  values  that, 
in  turn,  were  used  to  produce  kernel  density  estimates. 

Figures  21  through  23  present  kernel  density  estimates  for  the  ATI  Auto  sample 
with  h  equal  to  2.00,  0.95,  and  0.50,  respectively.  Comparison  of  the  median  centering 
Lognorm^  density  estimate  in  Figure  23  to  the  truncated  Gaussian  density  estimate  in 
Figure  14  shows  that  these  two  kemeb  produce  similar  results  considering  the  interpre¬ 
tation  of  zero  second  service  times  to  half  second  service  times.  Therefore,  one  can  con¬ 
clude  that  the  .\TI  Auto  sample's  optimum  h  value  for  the  median  centering  Lognormal 
kernel  lies  around  0.50. 


. .  -  .•  .-  .  - .  .  • 
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l  igiiie  23.  Application  of  the  Median  Ceiiterint*  l-ognormal  Kernel  with  h=0.50 


TABLE  8.  Median  Centehns  Lognormal  Kernel’s  Critical  b  Values 
for  the  FIST  HQ  Samples 


Figures  24  through  20  present  kernel  density  estimates  for  the  ATI  Review  sam¬ 
ple  with  h  equal  to  5.00,  2.00  and  0.95,  respectively.  The  density  estimate  with  h  equal 
to  2.00  seems  to  be  a  good  density  estimate  and  can  be  compared  to  the  truncated 
Gaussian  density  estimate  in  Figure  13.  Comparing  the  oversmoothed  median  center¬ 
ing  Lognormal  density  estimate  in  Figure  24  with  the  oversmoothed  truncated  Gaus¬ 
sian  kernel  density  estimate  in  Figure  11.  however,  shows  that  the  median  centering 
Lognormal  kernel  produces  a  concave  shaped  density  estimate,  whereas,  the  truncated 
Gaussian  kernel  produces  a  convex  shaped  density  estimate. 

Criticisms  of  the  median  centering  Lognormal  kernel 

From  the  application  of  the  median  centering  Lognormal  kernel  to  the  FIST  HQ 
samples  one  can  see  that  the  median  centering  Lognormal  kernel  produces  density  esti¬ 
mates  that  are  very  similar  to  the  ones  of  the  truncated  Gaussian  kernel.  Since  many 
symmetric  unimodal  kernel  functions  are  nearly  optimal,  this  suggests  that,  perhaps, 
many  non-symmetric  unimodal  kernels  are  also  nearly  optimal. 

VI.  CONCLUSIONS 

.\n  introduction  to  kernel  density  estimation  was  achieved  by  examining  the  appli¬ 
cation  of  four  different  kerneb-the  Gaussian,  truncated  Gaussian,  mode  centering  Log¬ 
normal  and  median  centering  Lognormal  kernels— to  two  FIST  FDT&E  Q  samples.  This 
study  revealed  that  each  kernel  function  produced  slightly  different  nonparametric  esti¬ 
mates  of  each  sample's  underlying  probability  density  function.  .Monte  Carlo  simula¬ 
tions  with  the  Gaussian  kernel  showed  that  the  average  efficiency  of  Tapia  and 
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Thompson's  formula  for  this  kernel  for  sample  sizes  as  small  as  twenty-five  was  quite 
good.  However,  the  principle  limitation  of  the  Gaussian  kernel  was  the  prediction  of 
non- zero  densities  for  negative  service  time  observations.  The  truncated  Gaussian  ker¬ 
nel  corrected  this  limitation,  and  preliminary  Monte  Carlo  simulations  suggest  that  care¬ 
ful  selection  between  these  two  kernels  for  a  particular  sample  can  increase  average 
efficiency.  Applying  the  median  centering  Lognormal  kernel  to  the  FIST  HQ  samples 
resulted  in  nonparametric  density  estimates  that  were  very  similar  to  the  truncated 
Gaussian  kernel.  The  mode  centering  Lognormal  kernel  produced  markedly  different 
density  estimates  than  the  three  other  kernel  functions  by  supplying  density  estimates 
that  minimized  the  effect  of  tail  observations  and  emphasized  the  observations  closest  to 
zero.  Tapia  &  Thompson’s  recursive  formula,  which  optimizes  the  smoothing  parameter 
for  a  sample,  enhanced  the  robustness  of  Gaussian  and  truncated  Gaussian  kemeb  and 
provided  a  quick  and  efficient  starting  point  for  selecting  a  sample’s  smoothing  parame¬ 
ter  value.  Additional  analysis  of  the  theoretical  and  experimental  causes  of  the  modes 
suggested  by  these  optimum  kernel  density  estimates  should  be  undertaken  to  accept 
the  optimum  h  value  or  justify  a  modification  of  the  optimum  h  value. 

From  the  application  of  these  four  diilerent  kernel  functions  to  the  FIST  HQ  sam¬ 
ples,  it  was  shown  that  the  truncated  Gaussian  and  median  centering  Lognormal  kernels 
seem  to  be  superior  to  the  Gaussian  and  mode  centering  Lognormal  kernels  for  these 
samples.  The  principle  reasons  are  due  to  the  following  respective  inefficiencies  of  the 
latter  kernels:  (1)  the  prediction  of  non-zero  service  time  observations,  and  (2)  the  sensi¬ 
tivity  of  the  smoothing  parameter  to  small  changes.  The  optimizing  kernel  density  esti¬ 
mates  for  the  truncated  Gaussian  and  medi.in  centering  Lognormal  kernel  suggested 
seven  modes  for  the  ATI  auto  sample  and  five  modes  for  the  ATI  review  sample  in  the 
interval  [0,50].  The  Gaussian  kernel’s  optimum  density  estimates  also  were  consistent 
with  these  results.  By  re-examining  the  histograms  accompanying  the  unnormalized 
density  estimates,  one  can  “see”  these  predicted  modes  and  observe  the  relative  smooth¬ 
ness  of  kernel  density  estimates  in  comparison  to  histograms. 

In  conclusion,  this  paper  showed  that  kernel  density  estimation  can  be  useful  in 
producing  nonparametric  estimates  of  a  sample’s  underlying  probability  density  function 
and  emphasizes  the  importance  of  choosing  the  appropriate  kernel  function  and  smooth¬ 
ing  parameter  value  for  a  particular  sample. 
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APPENDIX  A.  A  USER’S  GUIDE  TO  KERNEL  PROGRAMS 


The  purpose  of  this  appeaduc  is  to  assist  the  individual  who  desires  to  use  the  pro¬ 
gramming  codes  presented  in  Appendices  B  through  G.  These  programs  are  currently 
available  on  the  U.S.  Army  Ballistic  Research  Laboratory’s  C'VBER  machine!  BRL- 
CY'BER.ARPA).  The  following  instructions  describe  the  files  required  to  run  these  pro¬ 
grams  and  the  specific  commands  required  for  the  forementioned  CYBER.  In  addition, 
the  instructions  for  a  curve  plotting  program  using  DISSPLA,  Graphics  Software  Pack¬ 
age  are  also  given. 
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PREVIOUS  PAGE 
S  BLANK 
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INSTRUCTIONS  FOR  GAUSSIAN,  TRUNCATED  GAUSSIAN,  MODE 
CENTERING  LOGNORMAL  AND  MEDIAN  CENTERING  LOGNORMAL 

KERNELS 

STEP  1. 


The  user  must  create  and  store  a  MFA  permanent  file.  This  file  should  con¬ 
tain  the  information  listed  in  Table  A-1. 

TABLE  A*l.  Data  File  for  Gaussian,  Truncated  Gaussian. 

Mode  Centering  Lognormal  and  Median  Centering  Lognormal  Kerntd 

(Note:  (PL)  denotes  program  limitation  and 
*  indicates  free-format  read  statements. ) 


DESCRIPTION 

LINE(S) 

FORMAT 

COLUMNS 

1.  N— the  sample  size 
(PL)  0  <  N  <  1000  (PL) 

1 

no 

1-10  - 

2.  X(I)— an  array  of  N 
elements  containing 
the  sample  values 

2-1001 

REAL 

* 

STEP  2. 

To  run  these  programs  in  lAF  type  the  following: 
BEGIN,PRCKERN,PRCKERN,(1),PRGM=(2) 
where 

(1)  =  file  uame  for  the  file  created  in  STEP  1. 

(2)  =  one  of  the  following  program  file  names 

GAUSS  {for  Gaussian  kernel} 

TRUNCG  {for  Truncated  Gaussian  kernel) 
MODLOG  {for  Mode  Centering  Lognormal  kernel) 
MEDLOG  {for  Median  Centering  Lognormal  kernel) 
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The  user  should  answer  the  interactive  prompts  appropriately  and  is  warned 
that  a  crash  will  occur  if  prompts  are  answered  inappropriately.  The  user  is 
also  cautioned  that  if  either  permanent  or  local  ^es  named  PDF  or  CDF 
exist  prior  to  entering  the  BEGIN  command,  an  error  133  message  will  occur 
during  execution.  If  this  occurs,  the  user  must  remove  these  files  and  repeat 
STEP  2. 

STEP  3. 

After  properly  completing  STEP  the  user  has  two  local  and  two  per¬ 
manent  files  called  PDF  and  CDF.  File  PDF  contains  the  normalized  proba¬ 
bility  density  function,  and  file  CDF  contains  the  cumulative  distribution 
function.  To  obtain  a  graphical  presentation  of  these  files  the  user  should 
consult  INSTRUCTIONS  FOR  CURVT;  PLOTTING  PROGR.4M  USING 
DISSPLA  GRAPfflCS  SOFTWARE  PACK.\GE. 


INSTRUCTIONS  FOR  THE  OPTIMIZING  H  PROCEDURES  FOR  THE 
GAUSSIAN  AND  TRUNCATED  GAUSSIAN  KERNELS 


STEP  1. 

See  STEP  1  INSTRUCTIONS  FOR  GAUSSIAN,  TRUNCATED  GAUS¬ 
SIAN,  MODE  CENTERING  LOGNORMAL  ANT>  MEDIAN  CENTERING 
LOGNORMAL  KERNELS. 


STEP  2. 


To  rua  these  programs  in  lAF  type  the  following: 
BEGIN,PRCOPTH,PRCOPTH,(  1),PRGM=(2) 
where 

(1)  =  file  name  for  the  file  created  in  STEP  1. 

(2)  =  one  of  the  following  program  file  names 

GAUSSH  {for  optimizing  h  procedure  for 
*  the  Gaussian  kernel}  . 

TRUNCGH  {for  optimizing  h  procedure  for 
the  truncated  Gaussian  kernel} 


The  user  should  answer  the  interactive  prompts  appropriately  and  is  warned 
that  a  crash  will  occur  if  prompts  are  answered  inappropriately.  The 
optimum  h  value  is  displayed  on  the  screen  during  the  execution  of  this  pro- 


INSTRUCTIONS  FOR  CURVE  PLOTTING  PROGRAM 
USING  DISSPLA  GRAPfflCS  SOFTWARE  PACKAGE 


STEP  1. 

This  program  was  designed  for  Tektronics  4014  terminal  or  a  Hewlett  Pack- 
ett  2648  terminal,  and  subsequently,  can  only  be  used  in  conjunction  with 
these  two  types  of  terminals. 

STEP  2. 


The  user  must  create  and  store  a  MFA  permanent  file.  This  file  should  con¬ 
tain  the  information  listed  in  Table  A-2.  Users  may  have  to  alter  the  call 
to  initialize  DISSPLA  output  to  their  terminal,  and  users  unfamiliar  with 
DISSPLA  are  encouraged  to  consult  a  DISSPLA  manual. 

TABLE  A-2.  Data  File  for  Curve  Plotting  Program 

(NOTE;  (PL)  denotes  program  limitation  and 
*  indicates  free-format  read  statements.) 


DESCRIPTION 

LINE 

TYPE/FORMAT 

COLUMNS 

1.  T-(tc  type  of  termiaal 
bcitf  and  ao  (ollowi: 

1  for  TKMM.  3  for  HP3S4* 

aad  aay  other  iatcfcr  will 
will  (enniaato  the  profram. 

T“ 

e 

3.  NPT^he  aamber  of  x  aad  j 

pain  to  be  plotted 

2 

iateger 

• 

y  PAGEX,  PAGEY-the  pa«e  >ixo 

la  width  aad  leagth  ia  lachet 

3 

real 

• 

4.  AXISX.  .AXISY-the  iibplot 

area'r  width  ud  leo(th  la 

iachee 

4 

nil 

• 

5.  LHl— plot  hcadiaf 
eacloaod  ia  liafle  qaotoa 

s 

character 

1-80  (PL) 

S.  HlNUM-total  #  of  chatactera. 

i 

iQteicer 

. 

I  lettcn,  ipacci,  (ymbolf  is  | 

I  LHl  (e.».  7  for  TITLE’)  ! 

I  0  <  HINUM  <  M  (PL)  j 


la  aicr'j  «ai(« 


21.  XMAX-x  oaximam  xalaa  la  aaer’a 


a  art! 


STEP  3. 

The  user  should  hare  an  MFA  permanent  normalized  probability  density 
function  file  (i.e.,  PDF)  and/or  an  MFA  permanent  cumulative  distribution 
function  file  (i.e.,  CDF)  created  by  running  one  of  the  programs  mentioned  in 
STEP  2's  INSTRUCTIONS  FOR  GAUSSIAN,  TRLWATED  GAUSSIAN, 
MODE  CENTERING  LOGNORMAL  AND  MEDIAN  CENTERING  LOG¬ 
NORMAL  KERNELS. 


STEP  4. 

To  run  these  programs  in  lAF  using  a  Hev/lett  Packard  2648  terminal  or  a 
g  Tektronics  4014  terminal  type  the  follo^ving: 

BEGIN,PRCDSPL,PRCDSPL,(1),(2) 

where 

(1)  =  fil«  name  for  the  file  created  in  STEP  2. 

(2)  =  file  name  for  "PDF”  or  "CDF"  file  mentioned  in  STEP  3. 

The  DISSPLA  plot  will  be  created  on  the  terminal  screen.  To  return  to  the 
■  lAF  prompt  after  the  plot  is  drawn,  hit  the  terminal’s  return  key. 
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APPENDIX  B.  GAUSSIAN  KERNEL 


GAUSSIAN  KERNEL 

PURPOSE:  THIS  PROGRAM  COM»UTES  THE  E^^»IRICAL  PROBABILITY  DENSITY 

FUNCTION  AND  ITS  ASSOCIATED  CUMLlATIVE  DISTRIBUTICW  FUNCTION 
FOR  THE  GAUSSIAN  KERNEL.  THIS  .VETHOD  IS  DESIGNED  FOR 
DATA  SA^PLES  TRAT  HAVE  SA^PLE  VALLES  R,\NGING  FROM 
NEGATIVE  INFINITY  TO  POSITIVE  INFINITY. 

FILES:  THE  FOLLOWING  DESCRIBES  THE  INFORMATICS  REQUIRED  BY  EACH 

FILE. 

DATA:  ON  THE  FIRST  LINE  VL'ST  BE  N,  THE  SIZE  OF 

THE  SAI^LE.  CS  THE  FOLLOWING  LINES  OF 
THE  FILE  SHOLTD  BE  THE  SAVfLE  VALLES  ONE 
PER  LINE. 

PDF:  THE  NORMALIZED  PROBABILITY  DENSITY 

FUNCTION 

CDF:  THE  CUMULATIVE  DISTRIBUTION  FLJNCTION 


PROGRAM  GAUS S  ( DATA ,  PDF  ,  CDF .  INPUT ,  OUTPUT ) 

SPECIFICATION  AND  DATA  STATEI^ENTS 
INTEGER  I  ,  K,  N 

REAL  ARG.  BASE,  BOTTOM,  F,  FHAT,  H,  MAX,  MIN,  STEP.  X,  Z 
DLVENSION  X(  1000) 

DIVENSION  IRAY(6) 

DATA  IRAY/6*  -0/ 

SET  PRINT  LIMIT  TO  ZERO,  AND  CALL  SYSTEM:  TO  INHIBIT  PRINTING 
OF  ERROR  115 


IRAY(4)=0 

CALL  S YSTENC (115,  IRAY ) 
INTTI.ALIZING  THE  FILES 


PREVIOUS  PAGE 
IS  BLANK 


OPEN(  UN  IT-1.  FI  LEW’ DATA'  ,  STATU^’Ca.D’  ) 

REWIND  1 

OPEN( UNIT-2  ,  F ILEW’  PDF  ’  .  STATUS—’ NEW’  ) 

REWIND  2 

OPEN{ l’NIT-3  .  FILE-’  CDF  ’  .  STA-PUS-’  NEW’  ) 

REIVIND  3 

OPEN( UNIT-5,  FILE-’  INPUT’ ) 

OPEN( UNIT— 6  .  FILE-' OUTPUT ’  ) 

READING  INFORMATICS  FRCM  THE  DATA  FILE 

READ( 1.100)  N 

100  FORMAT(IIO) 

MIN  —  I.OEIO 
MAX  —  1 .  OE-  10 
DO  10  I— l.N 

READ(1.«)  X(I) 

IF  (  X(  I )  .GT.  MAX  )  TOEN 
MAX  -  X(  I) 

END  IF 

IF  (  X(I)  .LT.  MIN  )  •niEN 
MIN  -  X(  I  ) 

END  IF 
10  CONTINUE 

DETERMINING  USER  REQUIRENCNTS 

WRITEIO.’I”  GAUSSIAN  KERNEL" )’ ) 

WRITE(8. 101 )  MIN 

101  FORMAT  I 'THE  SA^f»LE  VALUES  R.ANGE  FRO^  ’.F15.10) 

WRITE(8. 102)  MAX 

102  FORMAT!  ”rO  ’ .F15. 10. ’ .  INPUT  THE  FIRST  POINT" )  ’ ) 

WRITE!  8,  ’  i  'AT  WHICH  THE  DENSITY  ESTIMATE  WILL"  )  '  ) 

WRITE!  8, '!  "BE  a»-PUTED .  "  )  ’  ) 

READ!5.*)  BOTTOM 

WRI-TE!  6.1"  INPUT  THE  LAST  POINT  AT  WHICH"  )  ’  ) 

WRITE!  6.1  "THE  DENS  ITY  ESTIMATE  WILL  BE  CONFUTED.  '  )  '  ) 

20  READ!  5  .  •  )  TOP 

IF  !TOP  LT.  BOTTCM)  THEN 

WRITE!  8  ,  ’  !  •’IHIS  LAST  VALUE  MUST  EXCEED  THE  FIRST  VALUE.  "  )  ’  ) 
WRITE ! 8, ’! "PLEASE  INPUT  A  NEW  VALUE .")’ ) 

GOTO  20 
END  IF 

WRITE!  6  ,  •  !  "  IM’UT  THE  SIZE  SPACING  .AT  WHICH  KERNEL  DENS  ITY"  )  ) 
WRITE!  8,  •(  "ESTIMATES  ARE  DESIRED.  FOR  EX.ANPLE.  IF  THE’ )  ) 
WRITE! 6! "RANGE  FOR  THE  SANPLE  IS  50.0  THEN  A  SIZE  SPACING")  ) 
WRITE!  8.  I  "EQITAL  TO  0.  10  \NOtED  PRODUCE  DENSITY  ESTIMATES"  )  ’  ) 


30 


WRITEIO, ’ ('at  O.l,  0.2,  0.3 . 50.0.)")') 

READ(5,*)  STEP 
IF  (STEP  .LE.  0)  THEN 

WRITE(8.  ’  ("SIZE  SPACING  ML'ST  BE  GREATER  THAN  0."  )  ’  ) 

WRITE(6, ’ ("PLEASE  INPUT  A  NEW  VALUE." ) ’ ) 

GOTO  30 
END  IF 

WRITE(  6,  ’  (  "  INPUT  A  VALUE  FC«  H.  THE  SMOOTHING  PARAHETER.  "  )  ’  ) 

40  REA0(5.'*)  H 

IF  (H  .LE.  0)  THEN 

WRITE(  6  ,  •  (  "H  NCST  BE  GREATER  THAN  0  .  "  )  ’  ) 

WRITE(«  .  '  (  "PLEASE  INPUT  A  NEW  VALUE.  "  )  ’  ) 

GOTO  40 
END  IF 

•  CALCUTATING  AND  STORING  F,  THE  NORMALIZED  PROBABILITY  DENSITY  FU^NCTION 

DO  50  7*  BOTTOM.  TOP,  STEP 
F»0. 

DO  60  I—  l.N 

F  -  F  +  EXP(-0.5«((2-X(I))/H)«*2) 

60  CONTINUE 

F  -  F/SQRT(2. *3. 1416)/H/FL0AT(N) 

WRITE(2,103)  Z,  F 
103  FORMAT(2F15.lO) 

SO  CONTINUE 

•  .NU\ERICALLY  CALCUXATING  AND  STORING  F.  THE  CUMULATIVE  DISTRIBUTION 

•  FUNCTION 

IF  (BOTTOM  .LT.  MIN)  THEN 
BASE  -  BOTTOM 
ELSE 

BASE  -  MIN 
END  IF 

:o  FHAT  —  0. 

DO  80  I=l  .N 

ARC  (BASE  -  X(  I  )  )/H 
FHAT  =■  FH.\T  -»■  CSND(/\RG) 

80  CONTINU’E 

IF  (FHAT  .GT.  O.Ol)  THEN 
BASE  —  BASE  -  I . 

GOTO  70 
END  IF 

DO  90  Z-  BASE .  TOP ,  STEP 
FH\T-0 . 

DO  no  I—  1  .N 


ABG  *  (Z-X(I))/H 
FHAT  -  FHAT  -t-  CSND(ARG) 
no  CXNTINUE 

IF  (Z  GT.  BOTTCM)  TOEN 
WRITE(3.103)  Z,  FHAT/FLOAT(N) 

EM>  IF 

go  CONTINUE 
END 

THE  FOLLOWING  FUNCTION  APPROXIMATES  ’I«E  IInTTEGRAL  FROU  NEGATiVE 
INFINITY  TO  Y  OF  ( 1/SQRT(2*PI ) )  EXP( - l*(T**2)/2)  DT  ( I .E. .  TH 
CLM^EATIVE  STANDARD  NORMAL  DISTRIBUTION). 

FUNCTION  CSND(X) 

Y  -  ABS(X) 

P-l  ( ( ( (S.383E-06*Y+4.88906E-05)*Y+3.80036E.05)*Y+ 

&  0 . 003 2776263 )«Y+0. 021 14 1008 1 )*Y+0. 0498673460 )*Y 

P-0. 5«(P+l . )••( -16) 

IF  (X  err.  0.  )  THEN 
CSND-l  ,  -P 
ELSE 
eSND-P 
E>®  IF 
RETURN 


APPENDIX  C.  OPmnZING  H  PROCEDURE  FOR  THE  GAUSSIAN  KERNEL 


OPTIMIZING  H  PROCEDURE  FOR  G'.USSIAN  KERNEL 

PITIPOSE;  THE  PLRPOSE  OF  THIS  PROGRAM  IS  TO  OBTAIN  A  SATISFACTORY  V.\LUE 
FOR  THE  GAUSSIAN  KERNEL’S  H  VALLE. 

FILES:  BELCW  IS  A  BRIEF  SYNOPSIS  OF  THE  EXPECTED  CONTENTS  FOR  THE  FILE 

DATA. 

DATA:  ON  THE  FIRST  LINE  OF  THIS  FILE  SHOULD  BE  THE  NUVfiER 
OF  SAMPLE  VALLES  CGNTAINED  IN  THIS  FILE.  ON  THE 
FOLLOWING  LINES  SHOLED  BE  THE  SA\PLE  VALLES  ONE  PER 
LINE. 


PROGRAM  GAUS  SH ( DATA ,  I NPLT ,  OUTPUT ) 

•  SPECIFICATION  .AND  DATA  STATEVENTS 

INTEGER  I .  N.  TOTAL 

R^:AL  .ALPHA,  BETA,  C.  DIFF,  H,  NEVVH,  X 
DIXENSION  X|  1000) 

DIMENSION  IRAY(8) 

DATA  IRAY/8*  -0/ 

•  SET  PRINT  LIMIT  TO  ZERO.  .AND  CALL  SYSTENC  TO  INHIBIT  PRl.NTING 

•  OF  ERROR  115 


IRAY(-l)=iM) 

CALL  SYSTEM:|  115,  IR.AY) 

•  OPENING  THE  FILES 

OPEN|L'NIT»l  .FILE-DATA'  .  STATUS— ’  OLD  '  ) 
REWI.-n©  1 

OPEN(L-NIT-o.FILE-’  INPUT’  ) 

OPEN!  L’N  I T-6  ,  F I LE-  OUTPUT  ’  ) 

•  SETTING  THE  CWST.ANTS . 

.ALPHA  -  ((1./(2.*SQRT(3.1416)))«*(0.  :)) 
TOTAL  —  1000 

•  READING  FROM  THE  FILE  DATA 


PREVIOUS  PAGE 
IS  BLANK 


CAL^  ULTES(X,N.H.DIFF) 


•  E\f»LOYING  THE  OPTIMIZING  PROCEDURE  OLTLIf^  BY  RICHARD  A.  TAPIA  AND 

•  JAVES  R.  THOMPSON  IN  THEIR  BOOK  NONP.ARA\ETR I C  PROBABILITY  DENSITY 

•  ESTIMATICW,  BALTIMMIE,  M):  THE  JOIN  HOPKINS  UNIVERSITY  PRESS,  1978, 

•  P.  59. 

VWITE( 6 , ’ ( -GUESS  FOR  H; - ) ’ ) 

VWITE(6, 100)  H 

100  F0RMAT(F1S.5) 

VRITE(8,  ’  (  ’ . '  )  ’  ) 

C  -  .V(-0.2) 

DO  10  I»  1,  TOTAL 

CALL  BETAF(X,  BETA,  N,  H) 

NEWH  —  C  •  ALPHA  •  BETA 
IF  (NEWH  LT.  0.01)  THEN 
V«ITE(6, '(-H:-)’ ) 

WRITE(6, • (’DIRAC  SPIKE’)’) 

GOTO  20 
END  IF 

IF  (  ABS(NEV\H-H)  .LT.  DIFF  )  THEN 
WRITE(6. 101 )  DIFF 

101  FORMAT!  6. ’.AN  .ACCEPTABLE  H  WITHIN  ’  .Flo.  10.  •  IS’) 

\ARITE(6,  100)  NB\H 

GOTO  20 
END  IF 

WRITE(6, ’ ( -H: - ) ’ ) 

\\RITE(6  .  100)  NEV\« 

H  »  NEVVH 
10  CONTINUE 

20  END 


•  THIS  SL’BROUNTINE  READS  THE  VALVES  FROM  THE  FILE  DATA  .AND  I.NPLT. 

SUBROUTINE  V.ALUES  (X.N.H.DIFF) 

RE.AL  DIFF,  H,  MIN,  MAX 
DINENSION  X(  1000) 

PE.\D(  I  .  •  )  N 
MAX  =  I  .  OE  - 1 0 
MIN  =  1  .  OEIO 
DO  30  1=  1  .  N 
READ!  1 .  •  )  X(  I  ) 

IF  (X(  I  )  GT.  MAX)  THEN 


MAX  -  X(  I  ) 

END  IF 

IF  (X(I)  .LT.  MIN)  THEN 
MIN  =»  X(  I ) 

END  IF 
COiVTINUE 

V\RITE(6,  ’  {’  OPTIMIZING  H  VALLT:  FOR  THE  GAUSSIAN  KERNEL”)  ’  ) 
VW?ITE(6, 

WRITE) 6, ’ ("CHOOSING  THE  INITIAL  H  VALLT:  TO  BE  THE  SAMPLE" ) ’  ) 
WRITE! 6,  ’  ("RANGE  GUARANTEES  THE  CaWERGENCE  OF  THIS  ITERA-"  )  ’  ) 
WRITE)  6.  •  (  "TIVE  SEQLENCE  TO  THE  LARGEST  NON-NEGLATIVT:'  ) '  ) 

WRITE (6, 102)  MAX  -  MIN 

FORMAT) 'SOLITION.  THE  RANGE  OF  THIS  SA\PLE  IS  ’ ,F15. 10, ' . ’ ) 
WRITE)  6,  ’  (  "  INPUT  A  VALUE  FOR  THE  INITIAL  H  V.ALUE.  *  )  ’  ) 

READ(5.«)  H 

WRITE)  6  ,  ■  (  "  INPUT  THE  MINIVtM  ACCEPTABLE  DIFFERENCE  BETWEEN"  )  ’  ) 
\1RITE)  6  ,  ’  (  "THE  SUCCESSIVE  H  VALLES  .  A  RECCHA€NDED  DIFFER-  '  )  ’  ) 
WRITE(8, ■ ("ENCE  IS  l.OE-5")’) 

RELAD(5.*)  DIFF 
END 


•THIS  SLEROLTINE  .APPROXIMATES  TAPIA  AND  THOcPSON'S  BETA  V.ALLE 

•  BY  USING  THE  FORMXA  OUTLINED  BY  SCOTT  AND  F.ACTOR  IN  THE  JOURN,\L 

•  OF  THE  A^ERIC.AN  STATISTICAL  .ASSOCIATION.  VOL.  76,  NO.  373,  MARCH 

•  1981,  P.  10. 

SUBROUTINE  BETAF(X,BET.A.N.H) 

REAL  HF .  HS 
DIMENSION  X( 1000) 

A  —  1  .  /  1 2 . 

HS  »  H«*2 
HF  =  HS»*2 
BETA  =  0. 

DO  40  J»l,N 
DO  50  K  -  1  ,N 
Y  -  X)  J)  -  X)K) 

Z  =-  EXP(-(Y«*2)/(4.*HS)) 

BETA  =.  BETA  +  ( HF  -  ((Y**2)‘HS)  +  (.AMV*«4))  )*Z 
50  CONTI NLE 

40  CONTI. ME 

BETA  =  ( BETA* ( 3 . / ( 8 . • SQRT) 3.1416) •N*N» ( K«  *9 )  )  )  )  •  « ( -  1  / 5 .  ) 


APPENDIX  D.  TRUNCATED  GAUSSIAN  KERNEL 


TOUNCATED  GAUSSIAN  KERNEL 

PURPOSE:  THIS  PROGRAM  GOM>UTES  THE  NORMU^IZED  PROBABILITY 

DENSITY  FUNCTION  AND  ITS  ASSOCIATED  CUMJLATIVE  DISTRIBUTION 
FUNCTION  BY  USING  A  TRUNCATED  GAUSSIAN  KERNEL  FOR  THOSE 
DATA  SA\PLE  V.\LLT:S  LYING  BELOW  3 . ‘H  A^©  THE  TYPICAL  GAUSSIAN 
KERNEL  V.ALUES  FOR  THE  DATA  SAVPLE  VALUES  ABOVE  3.«H.  THIS 
DESIGN  IS  CHOSEN  FC«  DATA  SAM»LES  THAT  HAVE  NON-NEGATIVE 
SA^PLE  VALUES  AND  CAN  THEORETICALLY  ONLY  HAVE  NON-NEGATIVE 
VALUES . 

FILES:  THE  FOLLOWING  DESCRIBES  THE  INFORMATION  IN  EACH  FILE. 

DATA:  CW  THE  FIRST  LINE  MUST  BE  THE  SIZE  OF  THE 

SA\f*LE.  N.  ON  THE  FOLLOWING  LINES  OF  TOE 
FILE  SHOULD  BE  THE  SAM»LE  VALUES  ONE  PER 
LINE. 

PDF:  THE  NORMALIZED  PROBABILITY  DENSITY 

FUNCTION 

CDF:  THE  CUMULATIVE  DISTRIBUTION  FUNCTION 


PROGRAM  TRUNCG(  DATA .  PDF .  CDF ,  I.NPUT .  OLTPIT ) 
•  SPECIFICATION  AND  DATA  STATEMENTS 


INTEGER  I ,  K.  N 

REAL  .ARG,  BASE,  BOTTCM,  F,  H,  .MAX,  MIN,  STEP,  TOP,  X  ,Z 
DIMENSION  X(  1000,2) 

DIMENSION  IRAY(6) 

DATA  IRAY/a*  -0/ 

•  SET  PRINT  LIMIT  TO  ZERO,  AND  CALL  SYSTEMC  TO  INHIBIT  PRINTING 

•  OF  ERROR  115 


lRAY(4)-0 

CALL  SYSTEMC( 115, IRAY) 

•  INITIALIZING  THE  FILES 

OPEN(UNIT-l ,  FILE-  DATA’  ,  STATUS-  OLD '  ) 
REWIND  1 


PREVIOUS  PAGE 
IS  BLANK 


77 


a»EN(l)NIT-2, FILE?-’ PDF’  . STATUS-’ NEW’  ) 

REWIND  2 

Ca*EW(  UNIT-3,  FILE?-’ CDF’  .STATUS-’ NEW’  ) 

REWIM)  3 

OPEN(L’NIT— 5,FILB-’  INPUT*  ) 
a>EN(UNITi^.FILE5-’ OUTPUT’  ) 

READING  INFORMATION  FRCM  THE  INPUT  FILES  AND  CALCULATING  X(I,2) 

\MIITE(6,  ’  (’  TRUNCATED  GAUSS  IAN  KERNEL*  )’ ) 

VWITE(8.  ’("  *)’  ) 

V«ITE(  6  ,  ’  (  ■  INPUT  H.  THE  SNCOTHING  PARANETER.  *  )  ’  ) 

0  READ(S,*)  H 

IF  (H  .LE.  0)  THEN 

)MIITE( 6  .  ’  (  "H  MJST  BE  GREATER  THAN  0 .  *  )  ’  ) 

VWITE ( 6. ’ (-PLEASE  INPUT  A  NEW  VALUE. ")’ ) 

GOTO  10 
END  IF 

READ(1, 101)  N 
01  FORMAT(IIO) 

MAX  -  l.OE-10 
DO  20  I-1,N 

READ(1..)  X(I,1) 

IF  (  X(I,1)  ,GT.  3.*H  )  THEN 
X(1.2)  -  1. 

IF  (  X(I.l)  .GT.  MAX)  THEN 
MAX  -  X(  1 ,  1 ) 

END  IF 
ELSE 

ARG  -  X(I,1)/H 
X(  1,2)  -  CSND(ARG) 

END  IF 
0  CONTINUE 

DETERMINING  USER  REQUIRENENTS 


WRITE! 6, 102)  MIN 

FORMAT! ’THE  SAM»LE  VALUES  R.ANGE  FROM  ’.F15.10) 
WRITE!8, 103)  MAX 

FORMAT! 'TO  ’, FIS. 10, ’ .  INPUT  THE  FIRST’ ) 

WRITE!  6,  ’  ("NON- NEGATIVE  POINT  .AT  WHICH  THE  DENSITY* 
WRITE!  8,  ’  (  "ESTIMATE  WILL  BE  <X»«’UTED.  "  )  ’  ) 

READ!5.*)  BOTTOM 
IF  (BOTTCM  .LT.  0)  THEN 

WRITE!  6,  ’  ("THIS  VALUE  MJST  BE  NON-NEGATI VE .  "  )  '  ) 
WRITE !  8,  '! -PLEASE  INPUT  A  NEW  VALUE .")’ ) 


GOTO  30 
END  IF 

WRITE(  0 .  '  (  •  II^OT  TOE  LAST  NON-^EGATIVE  POINT  AT  WHICH”  )  ’  ) 

WRITEI «,  ’  i  "TOE  DENSITY  ESTINttTE  WILL  BE  C(M»l)TED.  ”  )  ’ ) 

40  READ(5,«)  TOP 

IF  (TOP  .LT.  BOTTCM)  TOEN 

VWITE(8,  ’  ("THIS  LAST  VALUE  MUST  EXCEED  TOE  FIRST  VALLE.  ”  )  ’  ) 

WRITE («,’ ("PLEASE  INPUT  A  NEW  VALLE.  ”)’ ) 

GOTO  40 
E^2}  IF 

WRITE(«,  ■  ("INPUT  THE  SIZE  SPACING  AT  WHICH  KERNEL  DENSITY")  ’  ) 

WR  nE(  6,  ’  ("ESTIMATES  ARE  DESIRED.  FOR  EXA\PLE,  IF  THE"  )  ’  ) 

V\RITE(6,  ’  ( "RANGE  FOR  THE  SAVPLE  IS  50.0  THEN  A  SIZE  SPACING”)’) 

WRITE)  0,  ’  ("EQUAL  TO  0. 10  WOULD  PRODUCE  MNSITY  ESTIMATES”  )  ’  ) 

WRITE(0,  '  ("AT  0. 1.  0.2,  0.3 . 50.0.)”)’) 

50  READ) 5,*)  STEP 

IF  (STEP  .LE.  0)  THEN 

\'VRITE(0,  ’  ("SIZE  SPACING  MUST  BE  GREATER  THAN  0.")’  ) 

WRITE ( 6. ’ ("PLEASE  INPUT  A  NEW  VALUE. ”)’ ) 

GOTO  50 
END  IF 

•  CALCULATING  AM)  STORING  r,  THE  NCRMKLIZED  PROBABILITY  DENSITY  FUNCTION 

DO  80  Z»  BOTTCM,  TOP,  STEP 
F-0. 

DO  70  I-  1,N 

F-  F  +  EXP(-0.5«((Z.X(I,1))/H)**2)/X(I,2) 

70  CONTINUE 

F  »  F/SQRT(2. *3. 1 4 16 ) /H/FLOAT(N) 

WRITE) 2, 104)  Z,  F 
104  FORMAT) 2F15. 10) 

60  CONTINUE 

•  NU^ERICALLY  CALCULATING  AND  STORING  F,  THE  CUMULATIVE  DISTRIBUTION  FL74CTION 

IF  (BOTTCM  .LT.  MIN)  THEN 
BASE  -  BOTTOM 
ELSE 

BASE  -  MIN 
END  IF 

80  FHAT  -  0  . 

DO  90  I-1,N 

ARG  »  (BASE  -  X( I, 1))/H 
FHAT  -  FHAT  +  CSND(.ARG) 

90  CONTI.NLE 

IF  (FH.AT  GT.  0.01)  THEN 


IF  (  (EASE-l.)  .LT.  0.)  HIEN 
BASE  —  0. 

QOTO  100 
ELSE 

BASE  —  BASE  •  1 . 

GOTO  80 
EM)  IF 
EM)  IF 

100  DO  110  ^  BASE.  TOP,  STEP 
F-O. 

DO  120  I*  l.N 

ARC  -  (Z-X( I, 1) )/H 
IF  (X(I,2)  .NE.  1)  THEN 
F  -  F  +  ((CSM)(ARG).l)/X(I,2))  +  1 
ELSE 

F  -  F  +  CSND(ARG) 

END  IF 

120  COMTINfE 

IF  (Z  .GT.  BOTTCM)  THEN 
WRITE(3.104)  Z.  F/FLOAT(N) 

END  IF 

110  CONTINUE 
END 

THE  FOLLOWING  FUNCTION  APPROXIMATES  THE  INTECSlAL  FROM  NEGATIVE 
INFINITY  TO  Y  OF  ( 1/SQRT( 2*PI ) )  EXP( -  1 •(T**2 )/2 )  DT  (I.E.,  THE 
aMT-ATIVE  STANDARD  NORMAL  DISTRIBUTION). 

FUNCTION  CSND(X) 

Y  -  ABS(X) 

P-(  (  (  ( (5.383E-06*Y+4.88906E-05)»Y+3.80038E-05)*Y+ 

&  0 .0032778263 )*Y+0. 021 14 10081 )*Y+0. 0498873469 )*V 

P^.5*(P+l.  )««(-l8) 

IF  (X  .GT.  0. )  THEN 
CSND-1 . -P 
ELSE 
CSMM» 

EM)  IF 
RETURN 
EM) 


.\PPENDIX  E.  OPTIMIZING  H  PROCEDURE  FOR  THE  TRUNCATED  GAUSSIAN  KERNEL 


OPTIMIZING  H  PROCEDURE  FOR  TRUNCATED  GAUSS L-VN  KERNEL 

PURPOSE:  THE  PUWOSE  OF  THIS  PROG31AVI  IS  TO  C»TAIN  A  SATISFACTORY  VALUE 
FOR  THE  TRUNCATED  GAUSSIAN  KERNEL  ’  S  H.  THE  RECURSIVE  NETHOD 
EVPLOYED  IS  OUTLINED  BY  RICHARD  A.  TA\XOR  .AND  JA\ES  R.  THCKPSOM 
ON  PP.  57-67  OF  THEIR  BOtK  NDNP ARAACTR I C  PRCfiABILITY  DENSITY 
ESTIMATI<»i  (BALTINCRE,  ^0;  THE  JOHN  HOPKINS  UNIVERSITY  PRESS, 
1978)  . 

FILES:  BELOW  IS  A  BRIEF  SYNOPSIS  OF  THE  EXPECTED  CCXVTENTS  FOR  THE 

FILE  DATA. 

DATA:  ON  THE  FIRST  LINE  OF  THIS  FILE  SHOULD  BE  THE  .MJVBER 
OF  S.\M»LE  VALLES  CCSVTAINED  IN  THIS  FILE.  OS  THE 
FOLLOWING  LINES  SHOLID  BE  THE  SAH^LE  V.ALUES  O.NE  PER 
LINE. 


PROGRAM  TRLTCCH  ( DATA .  I NPUT ,  OUTPUT ) 

•  SPECIFICATION  AND  DATA  STATO-ENTS 

INTEGER  K,  N,  TOTAL 

REAL  .AB,  BOTTOM.  C.  DIFF.  F,  H.  NEVAH,  STEP,  TOP,  Z 
DI\ENS ION  X( 1000) ,  Z(IOOO),  F(IOOO) 

DI\ENSICW  IR\Y(6) 

DATA  IRAY/6*  -0/ 

•  SET  PRINT  LIMIT  TO  ZERO.  AND  CALL  SYSTEMC  TO  INHIBIT  PRINTING 

•  OF  ERROR  115 

IRAY( 4  j-O 

CALL  SYSTEMC( 115, IRAY) 

•  OPENING  THE  FILES 

OPEN(UNIT- 1,  FILE^-'DATA’ .  STATUS='OLD’ ) 

REWIND  1 

OPEN(UNIT»5,  FILE-’ INPIT' ) 

OPEN( UNIT-6,  F I LE-' OUTPUT •  ) 

•  SETTING  THE  CmST.ANTS 

TOTAL  —  50 


READING  FROM  WE  FILES 


CALL  VALUES (X.N.H.K, BOTTOM. TOP, STEP, DIFF) 

EXPLOYING  TOE  OPTIMIZING  H  PROCEDURE  OUTLINED  BY  RICHARD  A.  TAPIA 
.AND  JA^ES  R.  TOO^PSON  IN  THEIR  BOOK  NONPARA\ETRIC  PROBABILITY  DENSITY 
ESTIMATION,  BALTIMDRE,  M):  THE  JOHN  HOPKINS  UNIVERSITY  PRESS.  1978, 

P.  59. 

WRITE! « , ’ ( ’GUESS  FC»  H: ’ ) ’ ) 

WRITE(S,10l)  H 
101  FORMAT(FIS.S) 

C  -  N*«(.0.2) 

DO  10  I-  1.  TOTAL 

call  ABF(X.  Z.  F.  AB.  N.  H.  K.  bottom.  TOP,  STEP) 

NEWH  -  C  •  AB 

IF  (  ABS(NEWH.H)  .LT.  DIFF  )  THEN 
WRITE(6. 100)  DIFF 

100  FORMAT! 'AN  ACCEPTABLE  H  WITHIN  ’  .FI. 3.  10,  ’  IS’) 

WRITE!6, 101)  NEWH 
GOTO  20 
END  IF 

IF  !NEWH  .LT.  DIFF)  THEN 
WRITE! 6, ’! ’DIRAC  SPIKE")’) 

GOTO  20 
END  IF 

WRITE!8, ’ !’H: ’ ) ’ ) 

WRITE! 6. 101 )  NEWH 
H  «  NEWH 
10  CONTINUE 

20  END 


•  THIS  SUEROl?JTINE  READS  THE  VALLES  FROM  THE  FILE  DATA  .AND  INPUT. 

SUBROUT INE  VALUES !  X .  N ,  H . K,  BOTTOM,  TOP ,  STEP  .  D I FF  ) 

REAL  MIN.  .MAX 
DI^ENSlCW  X!  1000) 

READ! 1,102)  N 
102  FORMAT!  1 10) 

MIN  »  1  .  OEIO 


103 


DO  30  I-  1,  N 
READ(1,103)  X(I) 

FORMiVT(FlO.O) 

IF  (  X(I)  .GT.  MAX  )  THEN 
MAX  -  X(  I ) 

END  IF 

IF  (  X(I)  .LT.  MIN  )  THEN 
MIN  -  X(  I ) 

END  IF 
30  OCWTINUE 

IF  (  MIN  .LT.  0.)  THEN 
MIN  »  0. 

END  IF 

V«ITE(fl.  ’  (’OPTIMIZING  H  FC«  THE  TOLNCATED  GAUSSIAN  KERNEL’)’  ) 
MRITE(«, ’ (’  ’)•  ) 

VyRITE(8,  ’  (’CHOOSING  THE  INITIAL  H  V.ALUE  TO  BE  THE  SAAPLE’  )  ’  ) 
WRI TE( 6.  ’  (’RANGE  GUARANTEES  THE  CONVERGENCE  OF  THIS  ITERA-’)’) 
VWITE(0, ’ i’TIVE  SEQUENCE  TO  THE  LARGEST  NON-NEGATIVE’ ) ' ) 

WRITE(  8.  ’  (’SOLUTION.  THE  RANGE  OF  THIS  SAMPLE  IS’)’) 

\MIITE(8.«)  MVX  -  MIN 

V«ITE(8,  ’  (’INPUT  A  GUESS  FOR  H,  THE  SMXTIHING  PARAACTER .  ’  )  ’  ) 
READ(5,«)  H 

WITE(  8 ,  ’  (  "  INPUT  THE  MINIMUM  ACCEPTABLE  DIFFERENCE  BETV.EEN"  )  ’  ) 
WIITE( 8  ,  •  ( ’THE  SUCCESSIVE  H  VALUES  .  A  REC(M«NDED  DIFFER- ’  )  ’  ) 
\ARITE(8,  ’  (’ENCE  IS  l.OE-5’)’) 

READ(5,*)  DIFF 

WRITE(8.  (’THE  ITERATIVE  H  VALUES  ARE’ ) ’ ) 

V«ITE(8, ’(’  ’)’  ) 

BOTTOM  -  MIN 

TOP  »  MAX  +  3 .  ‘H 

STEP  -  (  TOP  -  BOTTCM  )/FLOAT(K) 

END 


THIS  SUBROUTINE  APPROXIMATES  T.APIA  AND  THCM’SCW’S  .ALPHA-BETA  V.ALIE 
(I.E..  AB). 

SUBROUTINE  ABF  (X,Z,F,AB.N,H,K,  BOTTCM,  TOP  .  STEP  ) 

DI\ENSICW  X(  1000)  .  Z(IOOO).  F(IOOO) 

.ALPHAl  =•  (0.7784)  ••(-  2. 5) 

ALPHA2  —  (0.89l8)**(-2.5) 

A1  -  .ALPHA1/(SQRT(2.*3.  1 4  16  )  •  (H*  *3  ) ‘N) 


A2  -  ALPHA2*SQRT(0. 5)/(S<^(3. 141fl)*N*(H**2)  ) 

I  -  1 

DO  40  n-  BOTTOM.  TOP,  STEP 
F(I)  -  0. 

Z(I)  -  R 
DO  50  J  —  l.N 

Y  -  (Z(l)  -  X(J))/H 
IF  (X(J)  .GE.  (3.«H))  THEM 
F( I)  »  F( I)  +  A1  •  EXP(-0.S*Y*Y)  •  (Y*Y  -  1. ) 

ELSE 

F( I)  »  F(I)  +  A2  •  EXP(-0.5*Y*Y)*  (Y*Y  -  1. )  •  2. 
END  IF 
CCMTINUE 
F(I)  -  F(I)*F(I) 

I  -  I  +  I 
CXX^INLE 

CALCULATING  THE  BETA  VALUE  OF  AB  BY  SINJ»SON’S  1/3  RULE  OVER 
SLPINTERV.US 

AB  >  0. 

S  -  STEP/3. 

.DO  60  I  -  1.  K-2.  2 

AB  »  AB  +  S  •  (F(I)  +  (4.*F(I+1))  +  F(I+2)) 

CWriNUE 


APPENDIX  F.  MODE  CENTERING  LOGNORMAL  KERNEL 


VDDE  CENTERING  LOGNORMIL  KERNEL 

PURPOSE:  THIS  PROGRAM  WILL  CO^f^JTE  THE  PROBABILITY  DENSITY 

FUNCTION  AND  ITS  ASSOCIATED  CIM.UAT1VE  DISTRIBUTION  FUNCTICN 
BY  USING  A  MW)E  CENTERING  LOCNCRMM.  KRRNEL .  DUE  TO  THE  M\THE- 
MATICAL  CWSTRAINTS  OF  THE  LOGNORMAL  KERNEL  .ALL  DATA  SAVPLE 
V.ALUES  MJST  BE  STRICTLY  POSITIVE. 

FILES:  BELOW  IS  A  BRIEF  DESCRIPTION  OF  E.ACH  FILE’S  EXPECTED  CCNTENTS  . 

DATA:  ON  THE  FIRST  LINE  OF  THIS  FILE  SHOLID  BE  THE 

SAKPLE  SIZE,  N.  ON  THE  FOLLOWING  LINES  SHOULD 
BE  THE  SAMPLE  V.ALUES  ONE  PER  LINE. 

PDF:  THE  NORMALIZED  PROB.ABILITY  DENSITY  FUNCTION 

CDF:  THE  CUM.UATIVE  DISTRIBUTION  FIN’CTION 


PROGRAM  VDDELOG( DATA , PDF . CDF ) 

•  SPECIFICATICW  .AND  DATA  STATEMENTS 

INTEGER  I.  J.  K,  N 

REAL  BOTTCM,  F,  FHAT,  H,  MAX.  MIN,  R,  S,  STEP,  TOP,  X,  Z 
DI\f:NS  ION  F(  1000) .  X(IOOO),  Z(IOOO) 

DliVENSION  IRAY|«) 

DATA  IRAY/6*  -0/ 

•  SET  PRINT  LIMIT  TO  ZERO.  .AND  C;J.L  SYSTEIcC  TO  INHIBIT  PRINTING 

•  OF  ERROR  115 

IRAY(4)»0 

CALL  SYSTEMS!  115,  IR.AY) 

•  OPENING  THE  FILES 

OPEN! UNIT-1  .FILE-’ DATA’  ,  ST.A'TUS-’OLD’  ) 

REWIND  1 

OPEN  !U'N  IT-2,  FILE- ’PDF’  ,  STA'TUS-’ NEW’  ) 

REWIND  2 

OPEN!  UNIT— 3,  FILE-’ CDF’  .  STATUS— ’  NEW’  ) 

REWIND  3 


READING  DATA  FROM  INPUT  FILES 


READ( 1,100)  N 
100  F0RMAT(I15) 

MAX  »  0. 

MIN  »  0 . 

DO  10  I— l.N 
READ(1.«)  X(I) 

IF  (  X(I)  .GTT.  M\X  )  THEN 
MAX  »  X(  I) 

ELSE 

IF  (  X(I)  .LT.  MIN)  THEN 
MIN  -  X(  I) 

END  IF 
END  IF 
10  CXDNTINUE 

DETERMINING  USER  REQUIRE^ENTS 

WRITE!  •,’  (  "THE  SAM»LE  VALLES  R.ANGE  FRONF  )  ’  ) 

WRITE!  •,•)  MIN 
WRITE!  *,  ’  !  "TO-  )  '  ) 

WRITE!*, •)  MAX 

WRITE! • . ' ! ' II^LT  THE  FIRST  NON-NEGATIVE  POINT  AT  WHICH" ) ’ ) 
WRITE!  *,’  !  "THE  DENS  ITY  ESTIMATE  WILL  BE  CCM>LTED.  "  )  ’  ) 
lo  READ!*,*)  BOTTOM 

IF  !BOTTCM  -LE.  0)  THEN 

WRITE!  •  .  '  (  "THIS  V/VLUE  MUST  BE  GREATER  THAN  ZERO.  -  )  ’  ) 

WRITE!  •  .  ’  (  "PLEASE  INPUT  A  NEW  VALUE.  "  )  ’  ) 

GOTO  15 
END  IF 

WRITE!  •  .  ’  1  "  INPUT  THE  LAST  NCW-NEGATIVE  POINT  .AT  WHICH"  )  ’  ) 

WRITE!  •  .  '  (  "THE  DENS  ITY  ESTIMATE  WILL  BE  CONFUTED.  "  )  ’  ) 

25  READ!*,*)  TOP 

IF  !TOP  LT.  BOTTCM)  THEN 

WRITE!  *  .  '  !  "THIS  LAST  VALUE  M.’ST  EXCEED  THE  FIRST  V.ALUE.  "  )  '  ) 
WRITE!  •  ,  '  i  "PLEASE  INPUT  A  NBV  V.ALLE.  "  )  ’  ) 

GOTO  25 
END  IF 

WRITE!  •  -  ’  !  "  IMPIJT  THE  SIZE  SPACING  AT  WHICH  KERNEL  DENS  ITY"  )  ’  ) 
WRITE!  *,  ’  !  "ESTIMATES  ARE  DESIRED.  FOR  EX.\.\FLE,  IF  THE"  )  '  ) 
WRITE!  I  "RANGE  FOR  THE  SAVPLE  IS  50.0  THEN  A  SIZE  SP.\t  ING')  ) 
WRITE!  *.  ’  (  "EQL^AL  to  0  .  10  \W5LLD  PRODUCE  DENSlTi'  ESTIMATES  •  !  ) 

WRITE!  *.’(  ".AT  0  .  1  ,  0-2  O.Z .  50  0.)")') 

.35  READ!*.*)  STEP 

IF  I  STEP  LE.  0)  THEN 

WRITE!  •  .  '  !  "S  IZE  SP.ACING  ML'ST  BE  GREATER  THAN  0  '  )  •  ) 


WRITE  (•.’ ("PLEASE  INPUT  A  NEW  VALUE. ’)’ ) 

GOTO  35 
END  IF 

WRITE!  •  ,  '  (  •  INPUT  A  VALUE  FOR  H.  THE  SMX)THING  PARANETER.  "  )  ’  ) 

5  READ(«.*)  H 

IF  (H  .LE.  0)  THEN 

WRITE!  •  .  ’  (  "H  iVrST  BE  GREATER  THAN  0  .  "  )  ’  ) 

WRITE! • . ’ ( "PLEASE  INPUT  A  NEW  VALUE. ” ) ’ ) 

GOTO  45 
EM>  IF 

CALOJLATING  AND  STORING  F,  THE  NC«5«LIZED  PROBABILITY  DENSITY  FUNCTION 
I  -  1 

DO  20  IV-  STEP,  TOP,  STEP 
F!I)  =M).0 
DO  30  J-  l.N 

A»  !LOG(R)-H*H.LOG!X(  J))  )**2 
F(I)-F!I)  +  EXP(-0.5*A/H**2) 

;0  CONTINUE 

F(I)  -  F( I)/(SQRT(2.*3. 1410)  •  H  •  R  •  FLOAT(N)  ) 

Z(I)  -  R 

WRITE!2,102)  Z(I),  F!l) 

102  FORMAT! 2F 15. 10) 

I  a*  I  +  I 
•0  CCOTINUE 


.NUVERICALLY  CALCT.'LATING  AND  STORING  FRAT.  THE  a\LLATlVE  DISTRIBUTION 
FUNCTION,  BY  USI.NG  SI^PSON’S  1/3  RUTE  OVER  2  SU'D INTERV.US 

K  -  INTilTOP  -  BOTTOM) /STEP) 

S  -  ABS(Z(K.l)/K)/3. 

FHAT  —0.0 

DO  40  I-  1,  K-2.  2 

FHAT  -  FHAT  +  (S  •  (F( I  )  +  (4'F! I+l ) )  +  F( 1+2) ) ) 

WRITE! 3. 102)  Z(  I  )  .  FRAT 
CONTINUE 


0 


.\PPENDIX  G.  MEDIAN  CENTERING  LOGNORMAL  KERNEL 


NCDIAN  CENTERING  LOCNORMiU.  KERNEL 

PURPOSE:  THIS  PROGRAM  WILL  (X»f»UTE  THE  PROBABILITY  DENSITY 

FUNCTION  AND  ITS  ASSOCIATED  CUMULATIVE  DISTRIBUTION  FUNCTION 
BY  USING  A  ^ED1AN  CENTERING  LOGNMUMSiL  KERNEL.  DUE  TO  THE 
MATHEMATICAL  CONSTRAINTS  OF  THE  LOGNCXIMAL  KERNEL  -ALL  DATA 
SAMPLE  VALLES  MUST  BE  STRICTLY  POSITIVE. 

FILES:  BELOW  IS  A  BRIEF  DESCRIPTION  OF  EACH  FILE’S  EXPECTED  CONTENTS. 

DATA:  ON  THE  FIRST  LINE  OF  THIS  FILE  SHOtED  BE  THE  S*AM»LE 

SIZE,  N.  CN  THE  FOLLOWING  LINES  SHOUED  BE  THE 
SAN«*LE  VALUES  ONE  PER  LINE. 

PDF:  THE  NORMALIZED  PROBABILITY  DENSITY  FUNCTION 

CDF:  THE  CUMLEATIVE  DISTRIBUTION  FUNCTION 


PROGRAM  VfDLOG ( DATA , PDF . CDF ) 

•  SPECIF ICATICN  .AND  DATA  ST.ATE^SNTS 

INTEGER  I.  J,  K,  N 

REAL  BOTTOM,  F.  FHAT.  H.  MAX,  MIN,  R,  S.  STEP,  TOP,  X,  Z 
DI\£NS ION  F( 1000) ,  X(1000.2).  Z(IOOO) 

DI\ENS10N  IRAY(6) 

DATA  IRAY/6*  -0/ 

•  SET  PRINT  LIMIT  TO  ZERO,  .AND  CALL  SYSTE\C  TO  INHIBIT  PRINTING 

•  OF  ERROR  115 

IRAY(4)-0 

CALL  SYSTEMUl 1 15. IRAY) 

•  OPENING  THE  FILES 

OPEN)  UNIT-1  .  F ILE-’ DATA  ’  ,  ST.ATU'S=«' OLD ’  ) 

REWIND  1 

OPEN)  UN  I  T»2,  FILE- ’PDF'  .  STATUS—’ NEW’  ) 

REWIND  2 

OPEN)UNIT-3,FILE='CDF’  ,  STATUS— '  NEW’  ) 

REWIND  3 


'.  -  • 


•  READING  DATA  FROM  THE  INPUT  FILES 

WRITE(  • ,  ’  (  "  INPUT  A  VALUE  FOR  H,  THE  SMDOTOING  PARAUfiTER.  "  )  ’  ) 

$  READ(«,«)  H 

IF  (H  .LE.  0)  THEN 

WRITE(  •,*("»  MUST  BE  GRELATER  THAN  0 .  "  )  ’  ) 

VWITE(  •  ,  ’  (  "PLEASE  INPUT  A  NEW  VALL^.  "  )  ’  ) 

GOTO  5 
END  IF 

RELAD(  1,100)  N 
100  FORMAT!  110) 

MAX  0.0 
MIN  -0.0 
DO  10  I— l.N 

READ(1,*)  X(I.l) 

IF  (  X( I . 1)  .GT.  MAX  )  THEN 
MAX  -  X( I , 1 ) 

ELSE 

IF  (  X(I,1)  .LT.  MIN)  THEN 
MIN  -  X( I . 1 ) 

END  IF 
END  IF 

X(  I  .  2)  -LOG(  .  5*(  l.+S(^T(  1  .+(2.  •H/X(  I  .  1  )  )**2)  )  ) 

10  CONTINUE 

•  DETERMINING  USER  REQUIRE^C^TS 

WRITE! • .  ’ ! "THE  SA^PLE  VALUES  RANGE  FRONF  ) ’  ) 

WRITE! •■•)  min 
WRITE!*, '!"TO”)’ ) 

WRITE! *.*)  MAX 

WRITE!  •  .  ’  (  '  INPUT  THE  FIRST  NON-.NEGATI\E  POINT  AT  WHiai"  )  ’  ) 
WRITE!  •  .  ’  !  "THE  DENS  ITY  ESTIMATE  WILL  BE  COAFUTED.  "  )  ’  ) 

15  READ!*,*)  bottom 

IF  !BOTTCM  .LE.  0)  THEN 

WRITE!  *  .  ’  (  "THIS  V.VLUE  MUST  BE  GPEATER  TH/VN  ZERO.  *  )  ) 

WRITE!  •  .  ’  (  "PLEASE  INPUT  A  NEW  VALUE .  '  )  '  ) 

GOTO  15 
END  IF 

WRITE!  •  .  ’  (  "  INPUT  THE  LAST  POINT  .AT  WHIOT"  )  '  ) 

WRITE!  •  -  ’  (  "THE  DENS  ITY  ESTIMATE  WILL  BE  CmPUTED.  *  )  '  ) 

25  RELAD!  •  ,  *  )  TOP 

IF  !TOP  .LT.  BOTTOM)  THEN 

WRITE!  •  .  ’  !  "THIS  LAST  VALUE  MUST  EXCEED  THE  FIRST  V.ALLE  "  )  '  ) 
WRITE!  •  .  ■  !  "PLE.VSE  INPUT  A  NEW  VALUE .  "  )  '  ) 

GOTO  25 
END  IF 


96 


VWITE(  •  ,  ’  (  •  INPUT  TOE  SIZE  SPACING  AT  WHICH  KERNEL  DENS ITY"  )  ’  ) 
VWITE(*,  ’  ("ESTIMATES  ARE  DESIRED.  FOR  EXA^PLE,  IF  TOE’)’) 
WRITE(*,  ("RANGE  FOR  TOE  SAMPLE  IS  50.0  TOEN  A  SIZE  SPACING")  ’  ) 
WBITE(  •  ,  ’  (  "EQUAL  TO  0 . 10  WOULD  PRODUCE  DENS  ITY  ESTIMATES"  )  ’  ) 

WRITE(*. ’ ("AT  0. I,  0.2,  0.3 . 50.0.)")’) 

35  READ(  •  ,  •  )  STEP 

IF  (STEP  .LE.  0)  THEN 

WRITE(‘,  ’  ("SIZE  SPACING  MUST  BE  CRE.\TER  THAN  0."  )  ’  ) 

WRITE(  •  .  ’  (  "PLEASE  INPIT  A  NEW  VALIE.  "  )  ’  ) 

GOTO  35 
END  IF 

•  CALCULATING  AND  STORING  F.  THE  NC«MALIZED  PRC«/\BILITY  DENSITY  FINCTICW 

J  -  1 

DO  20  R  -  BOTTOM,  TOP .  STEP 
F(J)  -  0. 

DO  30  Ii-  1,N 

A  -  EXP(-0.5*LOG(R/X(I.1))**2/X(I,2)) 

F(J)  -  F(J)  +  A/SQRT(X(I.2)) 

30  COOTINUE 

F( J)  -  F( J)/(SQPT(2.*3. I416)*R)/FL0AT(N) 

Z(J)  -  R 

WRITE(2.102)  Z(J).  F(J) 

102  F0RMAT(2F15. 10) 

J  -  J  +  1 
20  CONTINUE 

•  NL^ERICALLY  CALCULATING  FHAT,  THE  CUMULATIVE  DISTRIBUTiaN  FUNCTION,  BY 

•  USING  SI\PSON’S  1/3  RUXE  OVER  2  SUEINTERYALS 

K  -  INT((TOP  -  BOTTOM)  /  STEP) 

S  »  ABS(Z(K.l)/K)/3. 

FHAT  -0.0 

DO  40  I-  I,  K-2,  2 

FHAT  »  FHAT  +  (S-(F( I  )  +  ( 4«F( I  +  l ) )  +  F(  1+2 | ) ) 

WRITE(3,102)  Z(I).  FRAT 
40  CONTINUE 
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